29. Screencast: Model Diagnostics in Python - Part I

Model Diagnostics In Python

At the end of the video, the wikipedia documentation is actually the opposite way of sklearn's output. The predicted is across the columns, and the actual is across the rows. Therefore,

Predicted
Actual 0 1
0 23 1
1 14 2
  • Therefore, there are 23 non-admitted that we predict to be non-admitted.
  • There are 14 admitted that we predicted to be non-admitted.
  • There is 1 non-admitted that we predict to be admitted.
  • There are 2 admitted that we predict to be admitted.

Why Train-Test Split And Additional Documentation

Here is the documentation for logistic regression sklearn. Additionally, here is the documentation for working with confusion matrices.

In this screencast, you created a train and test dataset, which is a very common approach in machine learning. Here is a useful resource for exploring the rationale and process for splitting your data into train and test sets. In general, it is useful to split your data into training and testing data to assure your model can predict well not only with the data it was fit to, but also on data that the model has never seen before. Proving the model performs well on test data assures that you have a model that will do well in the future use cases - whether that be future students, future transactions, or any other future predictions you might want to make.